Regular Expression

Regular expressions (also called REs, regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python. They are made available through re, a module in Python's standard library.

Import the re module: 

import re 

 RegEx Functions

The re module offers a set of functions for finding, splitting on, and replacing matches in a string:

  • findall - Returns a list containing all matches
  • search - Returns a Match object if there is a match anywhere in the string
  • split - Returns a list where the string has been split at each match
  • sub - Replaces one or many matches with a string
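
As a minimal sketch of the four functions listed above, the snippet below runs each one against the same sample sentence. The sentence and the variable names (text, all_matches, first, parts, replaced) are illustrative assumptions, not part of the re module.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "The rain in Spain stays mainly in the plain"

# findall: every non-overlapping match, returned as a list of strings.
all_matches = re.findall(r"ai", text)
print(all_matches)        # ['ai', 'ai', 'ai', 'ai']

# search: a Match object for the first match anywhere in the string,
# or None when nothing matches.
first = re.search(r"Spain", text)
if first is not None:
    print(first.group(), first.start(), first.end())   # Spain 12 17

# split: cut the string at every match (here, at each whitespace character).
parts = re.split(r"\s", text)
print(parts)

# sub: replace every match with the given replacement string.
replaced = re.sub(r"\s", "-", text)
print(replaced)           # The-rain-in-Spain-stays-mainly-in-the-plain
```

Because search returns None when there is no match, checking the result before calling group() avoids an AttributeError.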

Metacharacters are characters that have a special meaning inside a pattern:

  • \   Drops the special meaning of the character following it
  • []  Represents a character class
  • ^   Matches the beginning of the string
  • $   Matches the end of the string
  • .   Matches any character except a newline
  • ?   Matches zero or one occurrence
  • |   Means OR (matches any one of the patterns separated by it)
  • *   Matches any number of occurrences (including zero)
  • +   Matches one or more occurrences
  • {}  Indicates the number of occurrences of the preceding RE to match
  • ()  Encloses a group of REs
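
The symbols in the list above can be tried out one at a time. The following is a rough sketch rather than an exhaustive reference. The sample strings and variable names are assumptions made up for illustration.

```python
import re

# Hypothetical sample sentence, used for several of the checks below.
sentence = "I went to him at 11 A.M. on 4th July 1886"

dots = re.findall(r"\.", "A.M.")                # \ makes '.' a literal dot
vowels = re.findall(r"[aeiou]", "regex")        # [] is a character class
starts = bool(re.search(r"^I", sentence))       # ^ anchors at the beginning
ends = bool(re.search(r"1886$", sentence))      # $ anchors at the end
any_char = re.findall(r"h.t", "hat hot hit")    # . is any character but newline
optional = re.findall(r"colou?r", "color colour")   # ? is zero or one 'u'
either = re.findall(r"cat|dog", "cat, dog, bird")   # | means OR
star = re.findall(r"ab*", "a ab abbb")          # * is zero or more 'b'
plus = re.findall(r"ab+", "a ab abbb")          # + is one or more 'b'
four_digits = re.findall(r"\d{4}", sentence)    # {} is exactly four digits
groups = re.search(r"(\d+)th (\w+)", sentence).groups()   # () captures groups

print(dots, vowels, starts, ends)
print(any_char, optional, either)
print(star, plus, four_digits, groups)
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.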

 

# Import the regular expression module.
import re

# compile() builds a pattern object for the character class [a-e],
# which is equivalent to [abcde].
# The class matches any single character 'a', 'b', 'c', 'd', or 'e'.
p = re.compile('[a-e]')

# findall() searches the string and returns a list of all matches
print(p.findall("Regular expressions are essentially a tiny, highly specialized "
"programming language embedded inside Python and made available "
"through the re module")) 

Function compile() 
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. 

import re

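# \d is equivalent to [0-9] for ASCII digits.
# The raw string (r prefix) keeps Python from treating \d as a string escape.
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# ['1', '1', '4', '1', '8', '8', '6']

# \d+ matches a run of one or more consecutive digits
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# ['11', '4', '1886']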
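
Besides findall(), a compiled pattern object also has the searching and substitution methods mentioned above. The sketch below reuses the same sentence. The names digits, first_match, and masked are illustrative assumptions.

```python
import re

# Compile once, then reuse the pattern object for several operations.
digits = re.compile(r"\d+")
sentence = "I went to him at 11 A.M. on 4th July 1886"

# search() returns a Match object for the first run of digits.
first_match = digits.search(sentence)
print(first_match.group())   # 11
print(first_match.span())    # (17, 19)

# sub() replaces every run of digits with the replacement string.
masked = digits.sub("#", sentence)
print(masked)                # I went to him at # A.M. on #th July #
```

Compiling is mainly worthwhile when the same pattern is used many times. For a one-off match, module-level functions such as re.search() take the pattern string directly.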



